Week 4
Old Dominion University
\[E[y_t] = \mu \ \forall \ t \in \{0, ..., T+h\}\]
An example of a mean stationary process is white noise: independent and identically distributed process with mean \(\mu\) (usually 0) and variance \(\sigma^2\).
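As a quick illustration (a minimal sketch in base R), white noise can be simulated directly with rnorm():

```r
# White noise: iid draws with mean 0 and variance 1
set.seed(1)
e <- rnorm(1000)

mean(e)  # close to 0
var(e)   # close to 1
```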
When modeling, OLS assumes a mean stationary process.
Example: suppose we want to look at the impact of a policy on an outcome.
Call:
lm(formula = y ~ I(t >= 70), data = df)
Residuals:
Min 1Q Median 3Q Max
-7.4057 -2.9028 0.3151 2.8224 7.6771
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.0452 0.4380 16.08 <2e-16 ***
I(t >= 70)TRUE 10.1459 0.7867 12.90 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.638 on 98 degrees of freedom
Multiple R-squared: 0.6292, Adjusted R-squared: 0.6255
F-statistic: 166.3 on 1 and 98 DF, p-value: < 2.2e-16
\[\text{var}(y_t) = \sigma^2 \ \forall \ t \in \{0, ..., T+h\}\]
White noise is also an example of a variance stationary process. In fact, so is the previous time series.
Note that mean stationarity \(\neq\) variance stationarity.
This is an issue when calculating standard errors, which uses the variance of the error term.
OLS assumes the variance of the error is constant and does not vary in any dimension.
\[var(x) = \frac{1}{n}\sum(x-E[x])^2\]
\[var(x) = \frac{1}{n}\sum(x-E[x])(x-E[x])\]
\[cov(x, y) = \frac{1}{n}\sum(x-E[x])(y-E[y])\]
\[cor(x, y) = \frac{cov(x,y)}{\sqrt{var(x)var(y)}}\]
\[\rho(1) = \frac{cov(y_t,y_{t-1})}{var(y)}\]
\[\rho(k) = \frac{cov(y_t,y_{t-k})}{var(y)} = \frac{\gamma(k)}{\gamma(0)}\]
\[Y_t = 2 + .5Y_{t-1} + \epsilon_t\]
auto_cor <- function(vector, lags = 1){
t <- (lags+1):length(vector)
cor(vector[t-lags], vector[t], use="complete.obs")
}
df$y2 <- 0
for(i in 2:100) df$y2[df$t == i] <- 2 + (.5*df$y2[df$t == i-1]) + rnorm(1)
print(paste0("Lag ", c(1, 2, 10), ": ",
             round(c(auto_cor(df$y2, 1),
                     auto_cor(df$y2, 2),
                     auto_cor(df$y2, 10)), 4)))

[1] "Lag 1: 0.4447"  "Lag 2: 0.0864"  "Lag 10: -0.0315"
\[Y_t = \beta Y_{t-1} + e_{t}\]
\[Y_t = \beta (\beta Y_{t-2} + e_{t-1}) + e_{t} = \beta^2Y_{t-2} + \beta e_{t-1} + e_t\]
\[\begin{aligned} Y_t &= \beta (\beta (\beta Y_{t-3} + e_{t-2}) + e_{t-1}) + e_{t} \\ &= \beta^3Y_{t-3} + \beta^2e_{t-2} + \beta e_{t-1} + e_{t} \end{aligned} \]
\[Y_t = \sum_{i = 0}^{\infty} \beta^i e_{t-i}\]
What do we think about \(\beta\)?
If \(|\beta| < 1\), then eventually shocks in the past disappear, and we can calculate this sum.
\[E[Y_t] = E[\sum^\infty_{i = 0} \beta^ie_{t-i}] = 0\] \[\text{var}(Y_t) = \text{var}(\sum^\infty_{i = 0} \beta^ie_{t-i}) = \ ... \ = \frac{\sigma^2}{1 - \beta^2}\]
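The closed-form variance can be checked by simulation (a sketch; beta = 0.5 and sigma = 1 are illustrative choices):

```r
# Simulate a long AR(1), Y_t = 0.5*Y_{t-1} + e_t, and compare the sample
# variance to the theoretical value sigma^2 / (1 - beta^2)
set.seed(1)
beta <- 0.5
y <- numeric(10000)
for (t in 2:10000) y[t] <- beta * y[t - 1] + rnorm(1)

var(y)            # sample variance, close to the value below
1 / (1 - beta^2)  # theoretical variance: 1.333...
```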
We can also talk about partial autocorrelations.
What if \(|\beta| = 1\)? This is called a Random Walk or Unit Root Process. The equation looks like this: \(Y_t = Y_{t-1} + e_t\).
\[Y_t = Y_0 + \sum_{i=0}^{t-1} e_{t-i}\]
From Hansen: The past never disappears, and shocks are permanent.
To fix a unit root, take the first difference. For a random walk, the first difference is white noise:
\[\Delta Y_t = Y_t - Y_{t-1} = Y_{t-1} + e_{t} - Y_{t-1} = e_t\]
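A quick base-R check of this claim: cumulatively summing white noise shocks gives a random walk, and diff() recovers the shocks exactly.

```r
# Random walk via cumulative sum of white noise shocks
set.seed(1)
e <- rnorm(500)
y <- cumsum(e)   # Y_t = Y_{t-1} + e_t (with Y_0 = 0)

# The first difference is exactly the white noise series
dy <- diff(y)
isTRUE(all.equal(dy, e[-1]))  # TRUE
```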
In a unit root test (e.g., the augmented Dickey-Fuller test), the null hypothesis is that a unit root exists. Rejecting the null suggests the process is stationary.
The mean of an AR(1) (without an intercept) is 0. However, the conditional mean is: \(E[Y_t|\Omega_{t-1}] = E[\beta Y_{t-1} + e_t|\Omega_{t-1}] = \beta Y_{t-1}\).
Note that no matter the value of \(Y_{t-1}\), multiplying by \(\beta\) shrinks it toward the long-run average (0).
The conditional variance is: \(\text{var}(Y_t - E[Y_t|\Omega_{t-1}]|\Omega_{t-1}) = \text{var}(e_t|\Omega_{t-1})\). From here, \(\text{var}(e_t|\Omega_{t-1}) = \text{var}(e_t) = \sigma^2\)
The autocorrelation function of an AR(1) process is: \(\rho(k) = \beta^k\)
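This can be verified against the sample ACF (a sketch; beta = 0.9 is an illustrative choice):

```r
# Compare the sample ACF of a simulated AR(1) to the theoretical beta^k
set.seed(1)
beta <- 0.9
y <- numeric(5000)
for (t in 2:5000) y[t] <- beta * y[t - 1] + rnorm(1)

sample_acf <- acf(y, lag.max = 3, plot = FALSE)$acf[2:4]  # lags 1-3
cbind(sample = round(sample_acf, 3), theory = round(beta^(1:3), 3))
```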
What if our AR(1) model has an intercept?
\[Y_t = \alpha + \beta Y_{t-1} + e_t\]
To skip some math:
\[\begin{align} E[Y_t] &= \alpha + \beta \mu \\ &\text{where} \ \mu = \frac{\alpha}{1-\beta} \end{align}\]
\[E[Y_t|\Omega_{t-1}] = \alpha + \beta Y_{t-1}\]
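A simulation check of the long-run mean (alpha = 0.5 and beta = 0.9 are illustrative choices, giving \(\mu = 5\)):

```r
# Y_t = 0.5 + 0.9*Y_{t-1} + e_t has long-run mean alpha/(1 - beta) = 5
set.seed(1)
alpha <- 0.5; beta <- 0.9
y <- numeric(10000)
for (t in 2:10000) y[t] <- alpha + beta * y[t - 1] + rnorm(1)

mean(y)             # close to 5
alpha / (1 - beta)  # 5
```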
s1 <- data.frame(y = rnorm(100), t = 1:100)
s2 <- data.frame(y = rep(0, 101), t = 0:100)
for(i in 1:100) s2$y[s2$t == i] <- (.1*s2$y[s2$t == i-1]) + rnorm(1)
s3 <- data.frame(y = rep(0, 101), t = 0:100)
for(i in 1:100) s3$y[s3$t == i] <- (.5*s3$y[s3$t == i-1]) + rnorm(1)
s4 <- data.frame(y = rep(0, 101), t = 0:100)
for(i in 1:100) s4$y[s4$t == i] <- (.9*s4$y[s4$t == i-1]) + rnorm(1)
s5 <- data.frame(y = rep(15, 101), t = 0:100)
for(i in 1:100) s5$y[s5$t == i] <- (.9*s5$y[s5$t == i-1]) + rnorm(1)
s6 <- data.frame(y = rep(0, 101), t = 0:100)
for(i in 1:100) s6$y[s6$t == i] <- 0.5 + (.9*s6$y[s6$t == i-1]) + rnorm(1)
s7 <- data.frame(y = rep(0, 101), t = 0:100)
for(i in 1:100) s7$y[s7$t == i] <- (-.9*s7$y[s7$t == i-1]) + rnorm(1)

For forecasting, use the n-step method.
\[E[Y_T|\Omega_{T-1}] = \beta Y_{T-1} \implies E[Y_{T+1}|\Omega_{T}] = \beta Y_{T}\]
\[E[Y_{T+2}|\Omega_{T}] = \beta Y_{T+1} = \beta(\beta Y_{T})\]
\[E[Y_T|\Omega_{T-1}] = \alpha + \beta Y_{T-1} \implies E[Y_{T+1}|\Omega_{T}] = \alpha + \beta Y_{T}\]
\[\begin{align} E[Y_{T+2}|\Omega_{T}] &= \alpha + \beta Y_{T+1} \\ &= \alpha + \beta(\alpha + \beta Y_{T}) \\ &= (1+\beta)\alpha + \beta^2Y_{T} \end{align}\]
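The recursion above is easy to code directly. Iterating it gives the closed form \(E[Y_{T+h}|\Omega_T] = \mu + \beta^h(Y_T - \mu)\) with \(\mu = \alpha/(1-\beta)\); the values below are illustrative:

```r
# h-step-ahead forecasts for Y_t = alpha + beta*Y_{t-1} + e_t, by recursion
alpha <- 0.5; beta <- 0.9; yT <- 8; h <- 10

fc <- numeric(h)
fc[1] <- alpha + beta * yT
for (i in 2:h) fc[i] <- alpha + beta * fc[i - 1]

# Equivalent closed form: mu + beta^h * (yT - mu), with mu = alpha/(1 - beta)
mu <- alpha / (1 - beta)
isTRUE(all.equal(fc, mu + beta^(1:h) * (yT - mu)))  # TRUE
```

Note that as \(h\) grows, \(\beta^h \to 0\) and the forecast reverts to \(\mu\).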
s5 <- data.frame(y = rep(0, 121), t = 0:120)
for(i in 1:120) s5$y[s5$t == i] <- (.9*s5$y[s5$t == i-1]) + rnorm(1)
s5$horizon <- ifelse(s5$t >= 101, 1, 0)
lim <- s5$horizon == 0
plot(s5$t[lim], s5$y[lim],
xlab = "Time", ylab = "Outcome",
xlim = c(0, 120), type = "l")
abline(h = 0, lty = 2)
reg1 <- arima(s5$y[lim], c(1, 0, 0), include.mean = FALSE)
reg2 <- arima(s5$y[lim], c(1, 0, 0), include.mean = TRUE)
p1 <- predict(reg1, n.ahead = 20)
p2 <- predict(reg2, n.ahead = 20)
lines(s5$t[!lim], s5$y[!lim], col = "tomato", lwd = 2)
lines(p1$pred, col = "dodgerblue")
lines(p2$pred, col = "mediumseagreen")
legend("bottomleft", bty = "n", lty = 1,
legend = c("No Intercept", "Intercept"),
col = c("dodgerblue", "mediumseagreen"))

s6 <- data.frame(y = rep(0, 121), t = 0:120)
for(i in 1:120) s6$y[s6$t == i] <- .5 + (.9*s6$y[s6$t == i-1]) + rnorm(1)
s6$horizon <- ifelse(s6$t >= 101, 1, 0)
lim <- s6$horizon == 0
plot(s6$t[lim], s6$y[lim],
xlab = "Time", ylab = "Outcome",
xlim = c(0, 120), type = "l")
abline(h = 0, lty = 2)
abline(h = .5 + .9*(.5/(1-.9)), lty = 3)
reg1 <- arima(s6$y[lim], c(1, 0, 0), include.mean = FALSE)
reg2 <- arima(s6$y[lim], c(1, 0, 0), include.mean = TRUE)
p1 <- predict(reg1, n.ahead = 20)
p2 <- predict(reg2, n.ahead = 20)
lines(s6$t[!lim], s6$y[!lim], col = "tomato", lwd = 2)
lines(p1$pred, col = "dodgerblue")
lines(p2$pred, col = "mediumseagreen")
legend("topleft", bty = "n", lty = 1, horiz = T,
legend = c("No Intercept", "Intercept"),
col = c("dodgerblue", "mediumseagreen"))

plot(s6$t[lim], s6$y[lim],
xlab = "Time", ylab = "Outcome",
xlim = c(0, 120), type = "l",
ylim = c(min(p1$pred - 1.645*p1$se,
p2$pred - 1.645*p2$se,
s6$y),
max(p1$pred + 1.645*p1$se,
p2$pred + 1.645*p2$se,
s6$y)))
abline(h = 0, lty = 2)
abline(h = .5 + .9*(.5/(1-.9)), lty = 3)
lines(s6$t[!lim], s6$y[!lim], col = "tomato", lwd = 2)
lines(p1$pred, col = "dodgerblue")
lines(p1$pred + 1.645*p1$se, col = "dodgerblue", lty = 2)
lines(p1$pred - 1.645*p1$se, col = "dodgerblue", lty = 2)
lines(p2$pred, col = "mediumseagreen")
lines(p2$pred + 1.645*p2$se, col = "mediumseagreen", lty = 2)
lines(p2$pred - 1.645*p2$se, col = "mediumseagreen", lty = 2)
legend("topleft", bty = "n", lty = 1,
legend = c("No Intercept", "Intercept"),
col = c("dodgerblue", "mediumseagreen"))

plot(s6$t[lim], s6$y[lim],
xlab = "Time", ylab = "Outcome",
xlim = c(0, 120), type = "l",
ylim = c(min(p1$pred - 1.645*p1$se,
p2$pred - 1.645*p2$se,
s6$y),
max(p1$pred + 1.645*p1$se,
p2$pred + 1.645*p2$se,
s6$y)))
abline(h = 0, lty = 2)
abline(h = .5 + .9*(.5/(1-.9)), lty = 3)
polygon(x = c(101:120, 120:101),
y = c(p1$pred - 1.645*p1$se,
rev(p1$pred + 1.645*p1$se)),
border = NA,
col = scales::alpha("dodgerblue", .2))
polygon(x = c(101:120, 120:101),
y = c(p2$pred - 1.645*p2$se,
rev(p2$pred + 1.645*p2$se)),
border = NA,
col = scales::alpha("mediumseagreen", .2))
lines(s6$t[!lim], s6$y[!lim], col = "tomato", lwd = 2)
lines(p1$pred, col = "dodgerblue", lwd = 2)
lines(p2$pred, col = "mediumseagreen", lwd = 2)
legend("topleft", bty = "n", lty = 1,
legend = c("No Intercept", "Intercept"),
col = c("dodgerblue", "mediumseagreen"))

\[Y_t = \epsilon_t + \theta \epsilon_{t-1}\] where \(\theta\) describes serial correlation, and \(\epsilon_t\) is white noise.
Mean:
\[E[\epsilon_t + \theta \epsilon_{t-1}] = E[\epsilon_t] + \theta E[ \epsilon_{t-1}] = 0\]
Variance:
\[\text{var}(\epsilon_t + \theta \epsilon_{t-1}) = \sigma^2 + \theta^2 \sigma^2 = (1 + \theta^2)\sigma^2\]
Conditional Mean:
\[E[Y_t | \Omega_{t-1}] = E[\epsilon_t + \theta \epsilon_{t-1} | \Omega_{t-1}] = \theta \epsilon_{t-1}\] - Therefore, the forecast error (\(Y_t - E[Y_t | \Omega_{t-1}]\)) is \(\epsilon_t\).
\[\gamma(1) = E[(\epsilon_{t} + \theta \epsilon_{t-1})(\epsilon_{t-1} + \theta \epsilon_{t-2})] \]
Because white noise has no serial correlation…
\[\gamma(1) = \theta \sigma^2\]
Similarly, \(\gamma(k) = 0 \ \forall \ k > 1\).
\[\rho(1) = \frac{\gamma(1)}{\gamma(0)} = \frac{\theta \sigma^2}{(1 + \theta^2) \sigma^2} = \frac{\theta}{1+\theta^2}\]
\[\rho(k) = 0 \ \forall \ k > 1\]
\[\rho(1) \in (\frac{-1}{2}, \frac{1}{2})\]
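Checking \(\rho(1) = \theta/(1+\theta^2)\) by simulation (theta = 0.7 is an illustrative choice, giving \(\rho(1) \approx 0.47\)):

```r
# MA(1): Y_t = e_t + theta*e_{t-1}
set.seed(1)
theta <- 0.7
e <- rnorm(10001)
y <- e[-1] + theta * e[-length(e)]  # align e_t with e_{t-1}

r <- acf(y, lag.max = 2, plot = FALSE)$acf[2:3]
r                      # sample rho(1) near 0.47, rho(2) near 0
theta / (1 + theta^2)  # theoretical rho(1) = 0.4698...
```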
\[Y_t = e_t + \theta e_{t-1} \implies e_t = Y_t - \theta e_{t-1}\]
\[\begin{align} e_t = Y_t - \theta e_{t-1} \implies &e_{t-1} = Y_{t-1} - \theta e_{t-2} \\ &e_{t} = Y_t - \theta Y_{t-1} + \theta^2 e_{t-2} \\ &e_{t} = Y_t - \theta Y_{t-1} + \theta^2 Y_{t-2} - \theta^3e_{t-3} \end{align}\]
\[\begin{align} Y_t &= \theta Y_{t-1} - \theta^2Y_{t-2} + \theta^3Y_{t-3} + \ ... \ + \ e_{t} \\ &= -\sum_{i = 1}^{\infty} (-\theta)^iY_{t-i} + e_{t} \end{align}\]
Similarly, this only converges if \(|\theta| < 1\).
See Hansen’s Lecture #9 for the formal recursive forecast.
We will use R’s arima() function as shown above.
AR(1) \(\implies \rho(k) = \beta^k \ \forall \ k\)
MA(1) \(\implies \rho(k) = 0 \ \forall \ k>1\)
s6 <- data.frame(y = rep(0, 121), t = 0:120, shocks = c(0, rnorm(120)))
for(i in 1:120) s6$y[s6$t == i] <- s6$shocks[s6$t == i] + .7*s6$shocks[s6$t == (i-1)]
s6$horizon <- ifelse(s6$t >= 101, 1, 0)
lim <- s6$horizon == 0
plot(s6$t[lim], s6$y[lim],
xlab = "Time", ylab = "Outcome",
xlim = c(0, 120), type = "l")
abline(h = 0, lty = 2)
reg1 <- arima(s6$y[lim], c(0, 0, 1), include.mean = FALSE)
reg2 <- arima(s6$y[lim], c(1, 0, 0), include.mean = FALSE)
p1 <- predict(reg1, n.ahead = 20)
p2 <- predict(reg2, n.ahead = 20)
lines(s6$t[!lim], s6$y[!lim], col = "tomato", lwd = 2)
lines(p1$pred, col = "dodgerblue")
lines(p1$pred + 1.645*p1$se, col = "dodgerblue", lty = 2)
lines(p1$pred - 1.645*p1$se, col = "dodgerblue", lty = 2)
lines(p2$pred, col = "mediumseagreen")
lines(p2$pred + 1.645*p2$se, col = "mediumseagreen", lty = 2)
lines(p2$pred - 1.645*p2$se, col = "mediumseagreen", lty = 2)
legend("bottomleft", bty = "n", lty = 1, horiz = T,
legend = c("MA(1)", "AR(1)"),
col = c("dodgerblue", "mediumseagreen"))

| Process | ACF | PACF |
|---|---|---|
| AR(p) | Decaying/Oscillating | 0 for k > p |
| MA(q) | 0 for k > q | Decaying/Oscillating |
| ARMA(p,q) | Decaying/Oscillating after lag max(0, q-p) | Decaying/Oscillating after lag max(0, p-q) |
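These patterns can be seen with R's acf() and pacf() on a simulated AR(1) (arima.sim() is from the base stats package; the coefficient 0.7 is illustrative):

```r
# For an AR(1), the ACF decays geometrically while the PACF cuts off
# after lag 1
set.seed(1)
y <- arima.sim(model = list(ar = 0.7), n = 5000)

round(acf(y, lag.max = 4, plot = FALSE)$acf[2:5], 2)   # decaying toward 0
round(pacf(y, lag.max = 4, plot = FALSE)$acf[1:4], 2)  # near 0 after lag 1
```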
- Data Cleaning
- Data Visualization
- Estimate, Interpret, Forecast, and Evaluate
ECON 707/807: Econometrics II